{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Automating surface water mapping with AI tools\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/opengeos/geoai/blob/main/docs/workshops/AWS_2025.ipynb)\n",
    "\n",
    "This notebook is developed for the Australian Water School premium webinar on _**automating surface water mapping with AI tools**_ on July 23, 2025. To register for the webinar, please visit <https://awschool.com.au/training/ai-tools-for-mapping>.\n",
    "\n",
    "More resources:\n",
    "\n",
    "- **GitHub**: <https://github.com/opengeos/geoai>\n",
    "- **Documentation**: <https://opengeoai.org>\n",
    "- **Notebook**: <https://opengeoai.org/workshops/GeoAI_Workshop_2025>  \n",
    "- **Web App**: <https://huggingface.co/spaces/giswqs/surface-water-app>\n",
    "- **Web App Demo**: <https://youtu.be/LKySb6OYU7M>\n",
    "- **YouTube Tutorials**: <https://tinyurl.com/GeoAI-Tutorials>\n",
    "\n",
    "## Use Colab GPU\n",
    "\n",
    "To use GPU, please click the \"Runtime\" menu and select \"Change runtime type\". Then select \"T4 GPU\" from the dropdown menu.\n",
    "\n",
    "## Install packages\n",
    "\n",
    "Uncomment the following cell to install the package. It may take a few minutes to install the package. Please be patient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install geoai-py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Import libraries\n",
    "\n",
    "Next, we need to install the `geoai` package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import geoai"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "## Surface water mapping with non-georeferenced satellite imagery\n",
    "\n",
    "In the first part of this notebook, we will demonstrate how to map surface water using non-georeferenced satellite imagery in jpg/png format. \n",
    "\n",
    "### Download sample data\n",
    "\n",
    "We'll use the [waterbody dataset](https://www.kaggle.com/datasets/franciscoescobar/satellite-images-of-water-bodies) from Kaggle. You will need to create an account and download the dataset. I have already downloaded the dataset and saved a copy on Hugging Face. Let's download the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/waterbody-dataset.zip\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_folder = geoai.download_file(url)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "The unzipped dataset contains two folders: `images` and `masks`. Each folder contains 2,841 images in jpg format. The `images` folder contains the original satellite imagery, and the `masks` folder contains the corresponding surface water masks. Note that the image size varies greatly.\n",
    "\n",
    "We will use these 2,841 image pairs to train a surface water mapping model."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "### Train semantic segmentation model\n",
    "\n",
    "Before diving into the training, let's understand the basic concepts of deep learning architectures and encoders.\n",
    "\n",
    "Deep learning architecture is like the **blueprint** for a neural network — it defines how the network is built and how data flows through it. It includes layers of nodes (neurons) that process the data step by step to learn patterns, like identifying cats in pictures or translating languages.\n",
    "\n",
    "There are many types of architectures:\n",
    "\n",
    "* **Feedforward Neural Networks** (simple, goes one way)\n",
    "* **Convolutional Neural Networks (CNNs)** (used for images)\n",
    "* **Recurrent Neural Networks (RNNs)** (used for sequences like speech)\n",
    "* **Transformers** (used for language tasks, like ChatGPT)\n",
    "\n",
    "Each one is designed for specific types of problems.\n",
    "\n",
    "\n",
    "An **encoder** is a part of the neural network that takes the input (like a sentence or image) and compresses it into a smaller, meaningful form called a **feature representation** or **embedding**. It captures the important information while throwing away the noise.\n",
    "\n",
    "For example, if you feed the sentence “I love pizza” into the encoder, it turns that sentence into a set of numbers that still represent its meaning but are easier for the computer to understand and work with.\n",
    "\n",
    "Encoders are used in models like:\n",
    "\n",
    "* **Autoencoders** (for compressing and reconstructing data)\n",
    "* **Transformer Encoders** (like BERT, for understanding language)\n",
    "* **Encoder-Decoder models** (like translation systems)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "Now we'll train a semantic segmentation model using the new `train_segmentation_model` function. This function supports various architectures from `segmentation-models-pytorch`:\n",
    "\n",
    "- **Architectures**: `unet`, `unetplusplus` `deeplabv3`, `deeplabv3plus`, `fpn`, `pspnet`, `linknet`, `manet`\n",
    "- **Encoders**: `resnet34`, `resnet50`, `efficientnet-b0`, `mobilenet_v2`, etc.\n",
    "\n",
    "For more details, please refer to the [segmentation-models-pytorch documentation](https://smp.readthedocs.io/en/latest/models.html).\n",
    "\n",
    "Let's train the module using U-Net with ResNet34 encoder:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test train_segmentation_model with automatic size detection\n",
    "geoai.train_segmentation_model(\n",
    "    images_dir=f\"{out_folder}/images\",\n",
    "    labels_dir=f\"{out_folder}/masks\",\n",
    "    output_dir=f\"{out_folder}/unet_models\",\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    encoder_weights=\"imagenet\",\n",
    "    num_channels=3,  # number of channels in the input image\n",
    "    num_classes=2,  # background and water\n",
    "    batch_size=32,  # The number of images to process in each batch\n",
    "    num_epochs=3,  # training for 3 epochs to save time, in practice you should train for more epochs\n",
    "    learning_rate=0.001,  # learning rate for the optimizer\n",
    "    val_split=0.2,  # 20% of the data for validation\n",
    "    target_size=(512, 512),  # target size of the input image\n",
    "    verbose=True,  # print progress\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "In the model output folder `unet_models`, you will find the following files:\n",
    "\n",
    "- `best_model.pth`: The best model checkpoint\n",
    "- `final_model.pth`: The last model checkpoint\n",
    "- `training_history.pth`: The training history\n",
    "- `training_summary.txt`: The training summary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "### Evaluate the model\n",
    "\n",
    "Let's examine the training curves and model performance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.plot_performance_metrics(\n",
    "    history_path=f\"{out_folder}/unet_models/training_history.pth\",\n",
    "    figsize=(15, 5),\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "![image](https://github.com/user-attachments/assets/381ce436-3520-4706-9def-b0a7ae8244ac)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "### Run inference on a single image\n",
    "\n",
    "You can run inference on a new image using the `semantic_segmentation` function. I don't have a new image to test on, so I'll use one of the training images. In reality, you would use your own images not used in training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = 3\n",
    "test_image_path = f\"{out_folder}/images/water_body_{index}.jpg\"\n",
    "ground_truth_path = f\"{out_folder}/masks/water_body_{index}.jpg\"\n",
    "prediction_path = f\"{out_folder}/prediction/water_body_{index}.png\"  # save as png to preserve exact values and avoid compression artifacts\n",
    "model_path = f\"{out_folder}/unet_models/best_model.pth\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.semantic_segmentation(\n",
    "    input_path=test_image_path,\n",
    "    output_path=prediction_path,\n",
    "    model_path=model_path,\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    num_channels=3,\n",
    "    num_classes=2,\n",
    "    window_size=512,\n",
    "    overlap=256,\n",
    "    batch_size=32,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = geoai.plot_prediction_comparison(\n",
    "    original_image=test_image_path,\n",
    "    prediction_image=prediction_path,\n",
    "    ground_truth_image=ground_truth_path,\n",
    "    titles=[\"Original\", \"Prediction\", \"Ground Truth\"],\n",
    "    figsize=(15, 5),\n",
    "    save_path=f\"{out_folder}/prediction/water_body_{index}_comparison.png\",\n",
    "    show_plot=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "![image](https://github.com/user-attachments/assets/00308228-0819-4161-9a35-6a98f4cefa93)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20",
   "metadata": {},
   "source": [
    "### Run inference on multiple images\n",
    "\n",
    "First, let's download the test images and masks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/waterbody-dataset-sample.zip\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_dir = geoai.download_file(url)\n",
    "images_dir = f\"{data_dir}/images\"\n",
    "masks_dir = f\"{data_dir}/masks\"\n",
    "predictions_dir = f\"{data_dir}/predictions\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.semantic_segmentation_batch(\n",
    "    input_dir=images_dir,\n",
    "    output_dir=predictions_dir,\n",
    "    model_path=model_path,\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    num_channels=3,\n",
    "    num_classes=2,\n",
    "    window_size=512,\n",
    "    overlap=256,\n",
    "    batch_size=4,\n",
    "    quiet=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24",
   "metadata": {},
   "source": [
    "## Surface water mapping with Sentinel-2 imagery\n",
    "\n",
    "In the second part of this notebook, we will demonstrate how to map surface water using Sentinel-2 imagery with six spectral bands, including blue, green, red, near-infrared, and short-wave infrared bands.\n",
    "\n",
    "### Download sample data\n",
    "\n",
    "We'll use the [Earth Surface Water Dataset](https://zenodo.org/records/5205674#.Y4iEFezP1hE) from Zenodo. Credits to the author (Xin Luo) of the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://zenodo.org/records/5205674/files/dset-s2.zip?download=1\"\n",
    "data_dir = geoai.download_file(url)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26",
   "metadata": {},
   "source": [
    "In the unzipped dataset, we have four folders:\n",
    "\n",
    "- `dset-s2/tra_scene`: training images\n",
    "- `dset-s2/tra_truth`: training masks\n",
    "- `dset-s2/val_scene`: validation images\n",
    "- `dset-s2/val_truth`: validation masks\n",
    "\n",
    "We will use the training images and masks to train a semantic segmentation model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27",
   "metadata": {},
   "outputs": [],
   "source": [
    "images_dir = f\"{data_dir}/dset-s2/tra_scene\"\n",
    "masks_dir = f\"{data_dir}/dset-s2/tra_truth\"\n",
    "tiles_dir = f\"{data_dir}/dset-s2/tiles\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28",
   "metadata": {},
   "source": [
    "### Create training data\n",
    "\n",
    "We'll create the same training tiles as before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = geoai.export_geotiff_tiles_batch(\n",
    "    images_folder=images_dir,\n",
    "    masks_folder=masks_dir,\n",
    "    output_folder=tiles_dir,\n",
    "    tile_size=512,\n",
    "    stride=128,\n",
    "    quiet=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30",
   "metadata": {},
   "source": [
    "### Train semantic segmentation model\n",
    "\n",
    "Now we'll train a semantic segmentation model using the new `train_segmentation_model` function. Let's train the module using U-Net with ResNet34 encoder:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.train_segmentation_model(\n",
    "    images_dir=f\"{tiles_dir}/images\",\n",
    "    labels_dir=f\"{tiles_dir}/masks\",\n",
    "    output_dir=f\"{tiles_dir}/unet_models\",\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    encoder_weights=\"imagenet\",\n",
    "    num_channels=6,\n",
    "    num_classes=2,  # background and water\n",
    "    batch_size=32,\n",
    "    num_epochs=5,  # training for 5 epochs to save time, in practice you should train for more epochs\n",
    "    learning_rate=0.001,\n",
    "    val_split=0.2,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32",
   "metadata": {},
   "source": [
    "### Evaluate the model\n",
    "\n",
    "Let's examine the training curves and model performance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.plot_performance_metrics(\n",
    "    history_path=f\"{tiles_dir}/unet_models/training_history.pth\",\n",
    "    figsize=(15, 5),\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34",
   "metadata": {},
   "source": [
    "![image](https://github.com/user-attachments/assets/61f675a7-ee67-4650-81c0-f754fe681f4d)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35",
   "metadata": {},
   "source": [
    "### Run inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36",
   "metadata": {},
   "outputs": [],
   "source": [
    "images_dir = f\"{data_dir}/dset-s2/val_scene\"\n",
    "masks_dir = f\"{data_dir}/dset-s2/val_truth\"\n",
    "predictions_dir = f\"{data_dir}/dset-s2/predictions\"\n",
    "model_path = f\"{tiles_dir}/unet_models/best_model.pth\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.semantic_segmentation_batch(\n",
    "    input_dir=images_dir,\n",
    "    output_dir=predictions_dir,\n",
    "    model_path=model_path,\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    num_channels=6,\n",
    "    num_classes=2,\n",
    "    window_size=512,\n",
    "    overlap=256,\n",
    "    batch_size=32,\n",
    "    quiet=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38",
   "metadata": {},
   "source": [
    "### Visualize results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39",
   "metadata": {},
   "outputs": [],
   "source": [
    "test_image_path = (\n",
    "    f\"{data_dir}/dset-s2/val_scene/S2A_L2A_20190318_N0211_R061_6Bands_S2.tif\"\n",
    ")\n",
    "ground_truth_path = (\n",
    "    f\"{data_dir}/dset-s2/val_truth/S2A_L2A_20190318_N0211_R061_S2_Truth.tif\"\n",
    ")\n",
    "prediction_path = (\n",
    "    f\"{data_dir}/dset-s2/predictions/S2A_L2A_20190318_N0211_R061_6Bands_S2_mask.tif\"\n",
    ")\n",
    "save_path = f\"{data_dir}/dset-s2/S2A_L2A_20190318_N0211_R061_6Bands_S2_comparison.png\"\n",
    "\n",
    "fig = geoai.plot_prediction_comparison(\n",
    "    original_image=test_image_path,\n",
    "    prediction_image=prediction_path,\n",
    "    ground_truth_image=ground_truth_path,\n",
    "    titles=[\"Original\", \"Prediction\", \"Ground Truth\"],\n",
    "    figsize=(15, 5),\n",
    "    save_path=save_path,\n",
    "    show_plot=True,\n",
    "    indexes=[5, 4, 3],\n",
    "    divider=5000,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40",
   "metadata": {},
   "source": [
    "![image](https://github.com/user-attachments/assets/53601ed7-2bd6-4e7e-b369-4d7bfc2ce120)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41",
   "metadata": {},
   "source": [
    "### Download Sentinel-2 imagery"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42",
   "metadata": {},
   "outputs": [],
   "source": [
    "m = geoai.LeafMap(center=[-16.3043, 128.7412], zoom=10)\n",
    "m.add_basemap(\"Esri.WorldImagery\")\n",
    "m.add_stac_gui()\n",
    "m"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43",
   "metadata": {},
   "outputs": [],
   "source": [
    "# m.stac_gdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44",
   "metadata": {},
   "outputs": [],
   "source": [
    "# m.stac_item"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45",
   "metadata": {},
   "outputs": [],
   "source": [
    "import leafmap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "46",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://earth-search.aws.element84.com/v1/\"\n",
    "collection = \"sentinel-2-l2a\"\n",
    "time_range = \"2025-01-01/2025-07-20\"\n",
    "bbox = [128.6735, -16.2466, 128.9577, -16.0962]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47",
   "metadata": {},
   "outputs": [],
   "source": [
    "search = leafmap.stac_search(\n",
    "    url=url,\n",
    "    max_items=10,\n",
    "    collections=[collection],\n",
    "    bbox=bbox,\n",
    "    datetime=time_range,\n",
    "    query={\"eo:cloud_cover\": {\"lt\": 10}},\n",
    "    sortby=[{\"field\": \"properties.eo:cloud_cover\", \"direction\": \"asc\"}],\n",
    "    get_collection=True,\n",
    ")\n",
    "search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48",
   "metadata": {},
   "outputs": [],
   "source": [
    "search = leafmap.stac_search(\n",
    "    url=url,\n",
    "    max_items=10,\n",
    "    collections=[collection],\n",
    "    bbox=bbox,\n",
    "    datetime=time_range,\n",
    "    query={\"eo:cloud_cover\": {\"lt\": 10}},\n",
    "    sortby=[{\"field\": \"properties.eo:cloud_cover\", \"direction\": \"asc\"}],\n",
    "    get_gdf=True,\n",
    ")\n",
    "search.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49",
   "metadata": {},
   "outputs": [],
   "source": [
    "search = leafmap.stac_search(\n",
    "    url=url,\n",
    "    max_items=1,\n",
    "    collections=[collection],\n",
    "    bbox=bbox,\n",
    "    datetime=time_range,\n",
    "    query={\"eo:cloud_cover\": {\"lt\": 10}},\n",
    "    sortby=[{\"field\": \"properties.eo:cloud_cover\", \"direction\": \"asc\"}],\n",
    "    get_assets=True,\n",
    ")\n",
    "search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50",
   "metadata": {},
   "outputs": [],
   "source": [
    "bands = [\"blue\", \"green\", \"red\", \"nir\", \"swir16\", \"swir22\"]\n",
    "assets = list(search.values())[0]\n",
    "links = [assets[band] for band in bands]\n",
    "for link in links:\n",
    "    print(link)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_dir = \"s2\"\n",
    "leafmap.download_files(links, out_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52",
   "metadata": {},
   "source": [
    "### Stack image bands\n",
    "\n",
    "Uncomment the following cell to install GDAL on Colab."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !apt-get install -y gdal-bin"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54",
   "metadata": {},
   "outputs": [],
   "source": [
    "s2_path = \"s2.tif\"\n",
    "geoai.stack_bands(input_files=out_dir, output_file=s2_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_raster(s2_path, indexes=[4, 3, 2])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56",
   "metadata": {},
   "source": [
    "### Run inference on a Sentinel-2 image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "57",
   "metadata": {},
   "outputs": [],
   "source": [
    "s2_mask = \"s2_mask.tif\"\n",
    "model_path = f\"{tiles_dir}/unet_models/best_model.pth\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.semantic_segmentation(\n",
    "    input_path=s2_path,\n",
    "    output_path=s2_mask,\n",
    "    model_path=model_path,\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    num_channels=6,\n",
    "    num_classes=2,\n",
    "    window_size=512,\n",
    "    overlap=256,\n",
    "    batch_size=32,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59",
   "metadata": {},
   "source": [
    "### Visualize the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_raster(\n",
    "    s2_mask, no_data=0, colormap=\"viridis\", basemap=s2_path, backend=\"ipyleaflet\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61",
   "metadata": {},
   "source": [
    "\n",
    "## Surface water mapping with aerial imagery\n",
    "\n",
    "In the last part of this notebook, we will demonstrate how to map surface water using aerial imagery from the USDA National Agriculture Imagery Program (NAIP).\n",
    "\n",
    "### Download sample data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62",
   "metadata": {},
   "outputs": [],
   "source": [
    "train_raster_url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip/naip_water_train.tif\"\n",
    "train_masks_url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip/naip_water_masks.tif\"\n",
    "test_raster_url = \"https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip/naip_water_test.tif\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63",
   "metadata": {},
   "outputs": [],
   "source": [
    "train_raster_path = geoai.download_file(train_raster_url)\n",
    "train_masks_path = geoai.download_file(train_masks_url)\n",
    "test_raster_path = geoai.download_file(test_raster_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "64",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.print_raster_info(train_raster_path, show_preview=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65",
   "metadata": {},
   "source": [
    "### Visualize sample data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_raster(train_masks_url, nodata=0, basemap=train_raster_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_raster(test_raster_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68",
   "metadata": {},
   "source": [
    "### Create training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_folder = \"naip\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70",
   "metadata": {},
   "outputs": [],
   "source": [
    "tiles = geoai.export_geotiff_tiles(\n",
    "    in_raster=train_raster_path,\n",
    "    out_folder=out_folder,\n",
    "    in_class_data=train_masks_path,\n",
    "    tile_size=512,\n",
    "    stride=128,\n",
    "    buffer_radius=0,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71",
   "metadata": {},
   "source": [
    "### Train segmentation model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.train_segmentation_model(\n",
    "    images_dir=f\"{out_folder}/images\",\n",
    "    labels_dir=f\"{out_folder}/labels\",\n",
    "    output_dir=f\"{out_folder}/models\",\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    encoder_weights=\"imagenet\",\n",
    "    num_channels=4,\n",
    "    pretrained=True,\n",
    "    batch_size=8,\n",
    "    num_epochs=5,\n",
    "    learning_rate=0.005,\n",
    "    val_split=0.2,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73",
   "metadata": {},
   "source": [
    "### Evaluate the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.plot_performance_metrics(\n",
    "    history_path=f\"{out_folder}/models/training_history.pth\",\n",
    "    figsize=(15, 5),\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75",
   "metadata": {},
   "source": [
    "### Run inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76",
   "metadata": {},
   "outputs": [],
   "source": [
    "masks_path = \"naip_water_prediction.tif\"\n",
    "model_path = f\"{out_folder}/models/best_model.pth\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.semantic_segmentation(\n",
    "    test_raster_path,\n",
    "    masks_path,\n",
    "    model_path,\n",
    "    architecture=\"unet\",\n",
    "    encoder_name=\"resnet34\",\n",
    "    encoder_weights=\"imagenet\",\n",
    "    window_size=512,\n",
    "    overlap=128,\n",
    "    confidence_threshold=0.3,\n",
    "    batch_size=32,\n",
    "    num_channels=4,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78",
   "metadata": {},
   "source": [
    "### Vectorize masks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79",
   "metadata": {},
   "outputs": [],
   "source": [
    "output_path = \"naip_water_prediction.geojson\"\n",
    "gdf = geoai.raster_to_vector(\n",
    "    masks_path, output_path, min_area=1000, simplify_tolerance=1\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80",
   "metadata": {},
   "outputs": [],
   "source": [
    "gdf = geoai.add_geometric_properties(gdf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81",
   "metadata": {},
   "outputs": [],
   "source": [
    "len(gdf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_vector_interactive(gdf, tiles=test_raster_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83",
   "metadata": {},
   "outputs": [],
   "source": [
    "gdf[\"elongation\"].hist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84",
   "metadata": {},
   "outputs": [],
   "source": [
    "gdf_filtered = gdf[gdf[\"elongation\"] < 10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85",
   "metadata": {},
   "outputs": [],
   "source": [
    "len(gdf_filtered)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86",
   "metadata": {},
   "source": [
    "### Visualize results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.view_vector_interactive(gdf_filtered, tiles=test_raster_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88",
   "metadata": {},
   "outputs": [],
   "source": [
    "geoai.create_split_map(\n",
    "    left_layer=gdf_filtered,\n",
    "    right_layer=test_raster_url,\n",
    "    left_args={\"style\": {\"color\": \"red\", \"fillOpacity\": 0.2}},\n",
    "    basemap=test_raster_url,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89",
   "metadata": {},
   "source": [
    "![image](https://github.com/user-attachments/assets/a269b5a0-9f72-4ed8-8b2d-a175bbc45a23)"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "default_lexer": "ipython3"
  },
  "kernelspec": {
   "display_name": "geo",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
